Comprehensive and deep evaluation of structural variation detection pipelines with third-generation sequencing data

Background: Structural variation (SV) detection methods using third-generation sequencing data are widely employed, yet accurately detecting SVs remains challenging. Different methods often yield inconsistent results for certain SV types, which complicates tool selection and reveals detection biases.

Results: This study comprehensively evaluates 53 SV detection pipelines using simulated and real data from PacBio (CLR: Continuous Long Read; CCS: Circular Consensus Sequencing) and Nanopore (ONT) platforms. We assess their performance in detecting SVs of various sizes and types, their breakpoint biases, and their genotyping accuracy across sequencing depths. Notably, pipelines such as Minimap2-cuteSV2, NGMLR-SVIM, PBMM2-pbsv, Winnowmap-Sniffles2, and Winnowmap-SVision exhibit comparatively higher recall and precision. Our findings also show that combining multiple pipelines that share the same aligner, such as pbmm2 or winnowmap, can significantly enhance performance. Detailed rankings and performance metrics for individual pipelines are available in a dynamic table: http://pmglab.top/SVPipelinesRanking.

Conclusions: This study comprehensively characterizes the strengths and weaknesses of numerous pipelines, providing insights that can improve SV detection in third-generation sequencing data and inform SV annotation and function prediction.

Supplementary Information: The online version contains supplementary material available at 10.1186/s13059-024-03324-5.
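Recall, precision and F1 are the headline metrics throughout this evaluation. As a minimal sketch, assuming a single true-positive (TP) count shared by both metrics and already-computed false-positive (FP) and false-negative (FN) counts, they relate as follows; the function names are illustrative and not taken from the study:

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of reported SVs that match the benchmark set."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp: int, fn: int) -> float:
    """Fraction of benchmark SVs recovered by the pipeline."""
    return tp / (tp + fn) if tp + fn else 0.0


def f1(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r) if p + r else 0.0


# Made-up counts: 80 matched calls, 20 spurious calls, 40 missed benchmark SVs.
print(precision(80, 20), recall(80, 40), f1(80, 20, 40))
```

Because F1 is a harmonic mean, a pipeline cannot score well on it by maximizing only one of recall or precision.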


Fig S2. SV length distribution in the benchmark datasets, for both simulated (Sim) and real samples.

Fig S3. Location distribution of SVs on the hg38 genome in the benchmark datasets of simulated (Sim) and real samples.

Fig S4. Performance of SV detection pipelines across SV types (CCS). Precision and recall for DEL, DUP, INS, INV and BND were determined on the simulated data (a, precision; c, recall) and the real data (b, precision; d, recall). Bars are colored by SV type: orange (DEL), yellow (INS), green (DUP), brown (INV) and olive (BND). Pipelines are grouped by alignment tool (lra, minimap2, ngmlr, pbmm2, winnowmap).
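Per-type precision and recall depend on how a call is matched to a benchmark SV. The sketch below assumes a simple rule (same chromosome and type, start positions within a breakpoint tolerance, and sufficient size similarity) with one-to-one greedy matching; the thresholds `MAX_DIST` and `MIN_SIZE_SIM` are hypothetical values, not the ones used in the study:

```python
from typing import NamedTuple


class SV(NamedTuple):
    chrom: str
    pos: int      # start coordinate
    svtype: str   # DEL, INS, DUP, INV or BND
    svlen: int    # absolute size in bp (0 for BND)


# Hypothetical matching thresholds, chosen only for illustration.
MAX_DIST = 500       # maximum breakpoint distance in bp
MIN_SIZE_SIM = 0.7   # minimum size similarity (smaller size / larger size)


def matches(call: SV, truth: SV) -> bool:
    if call.chrom != truth.chrom or call.svtype != truth.svtype:
        return False
    if abs(call.pos - truth.pos) > MAX_DIST:
        return False
    # BND records carry no size (svlen 0), so only the position is compared;
    # a fuller BND comparison would also check the mate breakend.
    big = max(call.svlen, truth.svlen)
    return big == 0 or min(call.svlen, truth.svlen) / big >= MIN_SIZE_SIM


def per_type_precision_recall(calls, truths):
    """Return {svtype: (precision, recall)} using one-to-one greedy matching."""
    result = {}
    for t in {sv.svtype for sv in calls} | {sv.svtype for sv in truths}:
        c = [sv for sv in calls if sv.svtype == t]
        b = [sv for sv in truths if sv.svtype == t]
        used = set()  # indices of benchmark SVs already matched
        tp = 0
        for call in c:
            for i, truth in enumerate(b):
                if i not in used and matches(call, truth):
                    used.add(i)
                    tp += 1
                    break
        result[t] = (tp / len(c) if c else 0.0, tp / len(b) if b else 0.0)
    return result
```

Greedy matching is order-dependent; an optimal bipartite assignment would be stricter but longer to express.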

Fig S5 .Fig S6 .Fig S7 .
Fig S5. Correlation of pipeline F1-measures obtained with DUP_INS versus non-DUP_INS evaluation on simulated data. R1 is the Pearson correlation coefficient, R2 the Spearman correlation coefficient, and R3 the square of the Pearson correlation coefficient. The "INS.DUP" F1-measure represents the cumulative F1 scores of DEL and INS; the DUP_INS F1-measure represents the F1 score under DUP_INS.
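The three statistics R1, R2 and R3 can be computed from two lists of per-pipeline F1 scores (one per evaluation mode). A dependency-free sketch follows; `dup_as_ins` assumes that DUP_INS simply relabels duplications as insertions before matching, which is our reading of the caption rather than a confirmed detail of the study:

```python
from math import sqrt


def dup_as_ins(svtype: str) -> str:
    # Assumed DUP_INS behaviour: duplications are relabelled as insertions.
    return "INS" if svtype == "DUP" else svtype


def pearson(x, y):
    """Pearson correlation; assumes equal-length, non-constant inputs."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(v):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def r1_r2_r3(x, y):
    r1 = pearson(x, y)
    return r1, spearman(x, y), r1 ** 2
```

On Python 3.10+, `statistics.correlation` computes Pearson's r, and from 3.12 it also accepts `method="ranked"` for Spearman's coefficient.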

Fig S8. Running time and memory usage of the aligners and callers in the pipelines. a, running time of aligners; b, memory usage of aligners; c, running time of callers; d, memory usage of callers; e, read lengths of HG002 (5x depth).

Fig S13. Effect of the number of supporting reads on recall, precision and F1 of different SV detection pipelines. a, recall; b, precision; c, F1; d, detailed view of F1 scores at lower supporting-read thresholds.
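The trade-off in Fig S13 can be reproduced by sweeping a minimum supporting-read threshold over calls whose support and match status are already known. In this sketch the `Call` record is hypothetical (callers store read support in caller-specific VCF fields, assumed here to be parsed already), and recall assumes each true call corresponds to a distinct benchmark SV:

```python
from typing import NamedTuple


class Call(NamedTuple):
    sv_id: str
    support: int    # number of supporting reads, parsed from the caller's VCF
    is_true: bool   # whether the call matched the benchmark set


def sweep_min_support(calls, n_truth, thresholds):
    """Yield (min_support, precision, recall, f1) for each threshold."""
    for t in thresholds:
        kept = [c for c in calls if c.support >= t]
        tp = sum(c.is_true for c in kept)
        precision = tp / len(kept) if kept else 0.0
        recall = tp / n_truth if n_truth else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        yield t, precision, recall, f1
```

Raising the threshold discards weakly supported false positives (raising precision) but also discards true SVs with low coverage (lowering recall), which is why F1 typically peaks at an intermediate value that shifts with sequencing depth.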

Fig S18. Performance of SV detection pipelines across SV types (CCS GT). Precision and recall for DEL, DUP, INS, INV and BND were determined on the simulated data (a, precision; c, recall) and the real data (b, precision; d, recall). Bars are colored by SV type: orange (DEL), yellow (INS), green (DUP), brown (INV) and olive (BND). Pipelines are grouped by alignment tool (lra, minimap2, ngmlr, pbmm2, winnowmap).
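Genotype-aware evaluation additionally requires the called genotype to agree with the benchmark. One common way to score this, sketched here as an assumption rather than the study's exact rule, is genotype concordance over SV pairs that were already matched by position and type, comparing unphased VCF `GT` strings:

```python
from typing import Optional, Tuple


def normalize_gt(gt: str) -> Optional[Tuple[int, ...]]:
    """Turn a VCF GT string such as '0/1' or '1|0' into a sorted allele tuple.

    Phasing is ignored; genotypes with a missing allele ('./.') return None.
    """
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return None
    return tuple(sorted(int(a) for a in alleles))


def genotype_concordance(pairs) -> float:
    """Fraction of (call_gt, truth_gt) pairs whose genotypes agree.

    `pairs` is assumed to contain only SVs already matched by position and type.
    """
    if not pairs:
        return 0.0
    agree = 0
    for call_gt, truth_gt in pairs:
        call = normalize_gt(call_gt)
        if call is not None and call == normalize_gt(truth_gt):
            agree += 1
    return agree / len(pairs)
```

Missing genotypes are counted as discordant here; excluding them from the denominator instead is an equally defensible choice.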

Fig S23 .
Fig S22. Distribution of F1 scores on simulated and real data under pipeline-combination strategies based on caller and aligner.
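Combining pipelines amounts to merging their call sets and keeping SVs reported by enough of them. The sketch below assumes a hypothetical position tolerance (`MAX_DIST`) and uses the first-seen call as each cluster's representative; `min_pipelines=1` yields the union, `len(callsets)` the intersection, and values in between a voting scheme. It is an illustration, not the combination procedure used in the study:

```python
from typing import NamedTuple


class SV(NamedTuple):
    chrom: str
    pos: int
    svtype: str


MAX_DIST = 500  # hypothetical breakpoint tolerance in bp


def combine(callsets, min_pipelines):
    """Merge call sets from several pipelines; keep SVs seen by >= min_pipelines."""
    merged = []  # list of (representative SV, set of pipeline indices)
    for idx, calls in enumerate(callsets):
        for sv in calls:
            for rep, support in merged:
                if (rep.chrom == sv.chrom and rep.svtype == sv.svtype
                        and abs(rep.pos - sv.pos) <= MAX_DIST):
                    support.add(idx)
                    break
            else:  # no existing cluster matched: start a new one
                merged.append((sv, {idx}))
    return [rep for rep, support in merged if len(support) >= min_pipelines]
```

For example, with three pipelines from the same aligner, `combine(callsets, 2)` keeps only SVs supported by at least two of them, trading some recall for precision.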